Large-Sample Hydrology to Improve Prediction
in Ungauged Basin using Information Theory
Techniques
Surface Hydrology and Watershed Management (TS8)
20th December 2024, 11:55 AM to 01:00 PM (Venue: CT Hall)
HYDRO 2024 INTERNATIONAL
29th International Conference on Hydraulics, Water Resources, River & Coastal Engineering
Central Water & Power Research Station, Pune, India
Dr. Ankit Deshmukh
School of Technology, PDEU.
ankit.Deshmukh@sot.pdpu.ac.in
ankitdeshmukh.com | anixn
I will be talking about
Prediction in Ungauged Basins (PUB)
Trading space for time approach
Large sample hydrology with TSFT
Catchment similarity for PUB
Hydrologic modeling is challenging in data-scarce regions
The majority of basins/catchments in India are ungauged, or the available streamflow records are not
suitable for reliable streamflow projection [Kumar and Kaur, 2018; Sivapalan and Blöschl, 2017].
Climate change adds further projection uncertainty [Adler et al., 2003; Brohan et al., 2006].
Prediction in Ungauged Basins (PUB)
Prediction in ungauged basins is a paramount problem in water resource management, as
observed discharge is a key variable in all hydrological paradigms.
PUB refers to estimating hydrological behavior (streamflow and runoff) in river basins where
direct measurements are unavailable.
Several approaches have been proposed to predict runoff in ungauged basins or to provide a
workaround [Sivapalan, 2003].
The problem still persists due to limited data availability, regionalization and transferability issues,
modeling complexities, and climate change impacts [Blöschl et al., 2013; Hrachowitz et al., 2013;
Feng et al., 2020].
Best practices for predictions in ungauged basins
by Takeuchi et al. (2013) in Blöschl et al. (2013)
Trading Space for Time (TSFT)
One of the recently established approaches is trading space for time (TSFT) [Singh et al., 2011;
Deshmukh and Singh, 2016, 2019].
Here we use information from a large number of catchments to compensate for the limited availability
of other parameters or discharge data.
Several climate scenarios are used to explore possible unprecedented changes in the future.
A trading-space-for-time approach to probabilistic continuous streamflow
predictions in a changing climate - accounting for changing watershed behaviour.
Singh et al., 2011
Deshmukh and Singh, 2019
Deshmukh and Singh, 2016
Large-sample hydrology (LSH) for transferable hydrological models
- To deal with the PUB problem, we can leverage extensive datasets across a wide range of spatial and
temporal scales to uncover patterns.
- Improving regionalization techniques for ungauged basins.
- Advancing hydrological models to handle climate-change-induced extreme events [Vogel and
Sankarasubramanian, 2003; Addor et al., 2017].
The recent availability of a vast amount of discharge and catchment attribute data (CAMELS) allows us
to test the TSFT hypothesis.
                                 Average flow condition   Low flow conditions   High flow conditions
Magnitude of flow events         46                       22                    27
Timing of flow events            03                       03                    03
Rate of change in flow events    09                       00                    00
Frequency of flow events         00                       03                    11
Duration of flow events          00                       20                    24
Olden, J. D., & Poff, N. L. (2003)
Using LSH to determine the similarity among catchment groups
The recent availability of a vast amount of discharge and catchment attribute data (CAMELS) provides
a useful link to relate catchments.
This helps for Indian catchments, where a link can be established between ungauged and gauged
catchments based on similar groupings/clusters of catchments.
We provide a simple framework to assess the similarity in the catchments:
Study area and data
Caravan¹ gives access to more than
6000 catchments².
Caravan - A global community dataset
for large-sample hydrology [Kratzert et
al., 2024]
We select four countries for this study and
choose 50 catchments from each based
on catchment area [500-2500 km²]
___
1: CAMELS: Catchment Attributes and MEteorology for
Large-sample Studies
2: 482 CAMELS (US), 150 CAMELS-AUS, 376
CAMELS-BR, 314 CAMELS-CL, 408 CAMELS-GB,
4621 HYSETS, 479 LamaH-CE
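The selection step above can be sketched as follows. This is a minimal sketch, not the workflow used in the study: the record fields (`gauge_id`, `country`, `area_km2`) are hypothetical, and drawing a seeded random sample of up to 50 catchments per country is an assumption, since the slide only states that the choice is based on catchment area.

```python
import random
from collections import defaultdict

def select_catchments(records, min_area=500.0, max_area=2500.0,
                      per_country=50, seed=0):
    """Keep catchments inside the area range, then sample up to
    `per_country` per country (the sampling rule is an assumption)."""
    rng = random.Random(seed)
    by_country = defaultdict(list)
    for rec in records:
        if min_area <= rec["area_km2"] <= max_area:
            by_country[rec["country"]].append(rec)
    selected = {}
    for country, recs in by_country.items():
        k = min(per_country, len(recs))
        selected[country] = rng.sample(recs, k)
    return selected

# Hypothetical attribute records, for illustration only
demo = [
    {"gauge_id": "A1", "country": "AUS", "area_km2": 650.0},
    {"gauge_id": "A2", "country": "AUS", "area_km2": 3200.0},  # above the range
    {"gauge_id": "C1", "country": "CHL", "area_km2": 1800.0},
    {"gauge_id": "U1", "country": "USA", "area_km2": 120.0},   # below the range
]
picked = select_catchments(demo)
```

Fixing the seed keeps the sample reproducible, which matters when the same catchment set is reused across groupings.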
Study area and data for analysis
Methodology
We select four countries (Australia, Chile, India, USA) for this study and choose 50 catchments from each
based on catchment area [500-2500 km²].
This allows us to find similar catchments based on hydrological similarity.
Divide the catchment characteristics into 3 groups: PA, CA, HA.
Find spatial similarity based on clustering (elbow method: 5 clusters for all the groupings).
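The elbow method mentioned in the last step can be illustrated with a small k-means sketch. The code below is an illustration under stated assumptions, not the study's implementation: it uses plain Lloyd's iterations with a deterministic farthest-point initialization (chosen here only for reproducibility), and the toy 2-D points stand in for standardized catchment attributes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point initialization (assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans_wss(points, k, n_iter=100):
    """Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    centers = [list(c) for c in init_centers(points, k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: centers become the mean of their members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss

# Toy data: three compact groups of four points each
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
origins = [(0, 0), (10, 0), (0, 10)]
points = [(ox + dx, oy + dy) for ox, oy in origins for dx, dy in offsets]

# Elbow curve: WSS drops sharply up to k = 3, then flattens
wss_curve = {k: kmeans_wss(points, k)[1] for k in range(1, 6)}
```

Plotting `wss_curve` against k and looking for the bend gives the elbow; for the toy data the bend is at k = 3, whereas the study reports 5 clusters for its groupings.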
The Hopkins statistic is used to find the cluster tendency of the data
The Hopkins statistic measures the cluster tendency of a data set.
For random data the Hopkins statistic is about 0.5; for our data it is near 1 (clusterable)
[Hopkins and Skellam, 1954].
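A minimal sketch of the Hopkins statistic follows, using the convention in which values near 1 indicate clusterable data and values near 0.5 indicate random data. The sample size `m`, the bounding-box sampling window, and raising distances to the power of the dimension are assumptions of this sketch; published variants differ on these details.

```python
import math
import random

def nearest_dist(p, data, skip_index=None):
    """Distance from p to its nearest point in data (optionally skipping one index)."""
    return min(math.dist(p, q) for i, q in enumerate(data) if i != skip_index)

def hopkins(data, m=None, seed=0):
    """H = sum(u^d) / (sum(u^d) + sum(w^d)); u: uniform-to-data, w: data-to-data."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    m = m or max(1, n // 10)
    lo = [min(p[j] for p in data) for j in range(d)]
    hi = [max(p[j] for p in data) for j in range(d)]
    # w: nearest-neighbour distances for m sampled real points
    idx = rng.sample(range(n), m)
    w = [nearest_dist(data[i], data, skip_index=i) ** d for i in idx]
    # u: nearest-data distances for m uniform points in the bounding box
    u = []
    for _ in range(m):
        q = tuple(rng.uniform(lo[j], hi[j]) for j in range(d))
        u.append(nearest_dist(q, data) ** d)
    return sum(u) / (sum(u) + sum(w))

# Toy data: two tight, widely separated groups of 20 points each
data = [(x0 + 0.25 * i, y0 + 0.25 * j)
        for x0, y0 in [(0.0, 0.0), (100.0, 100.0)]
        for i in range(5) for j in range(4)]
h_clustered = hopkins(data, m=4)
```

For strongly clustered data such as this toy set, the uniform points are typically far from any data point while the real points sit close to their neighbours, which pushes the statistic toward 1.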
Cluster plot for CA and PA groups.
The clustering of the CA and PA groupings is shown in the figure. The ideal number of clusters (n = 5) was found using
the gap statistic and the elbow method.
Evaluating clusters with variable grouping
Internal similarity criteria: to attain high intra-cluster and low inter-cluster similarity.
External criteria of clustering quality:
Rand Index
Purity
F-measure
Normalized Mutual Information (NMI)
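Two of the external criteria listed above, purity and the Rand index, can be sketched in a few lines. This is an illustrative sketch only: the label vectors are made up, and treating one grouping's cluster labels as the reference for another is an assumption about how the criteria would be applied here.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [cls for clu, cls in zip(clusters, classes) if clu == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

def rand_index(clusters, classes):
    """Fraction of point pairs on which the two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(clusters)), 2))
    agree = sum(
        (clusters[i] == clusters[j]) == (classes[i] == classes[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Made-up labels: cluster ids versus a reference grouping
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
p = purity(clusters, classes)       # 5 of 6 points are in their cluster's majority class
r = rand_index(clusters, classes)   # 10 of 15 pairs agree
```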
Normalized Mutual Information: NMI compares how much
information random variables (cluster vectors) share.
Entropy in information theory is a measure of uncertainty
or randomness in a set of data.
I: mutual information
H: entropy
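With I and H as defined above, one common form is NMI(A, B) = 2 I(A; B) / (H(A) + H(B)). The sketch below assumes this arithmetic-mean normalization, which may differ from the variant shown on the original slide; the label vectors are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two label vectors of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nab / n) * math.log((nab * n) / (pa[x] * pb[y]))
        for (x, y), nab in pab.items()
    )

def nmi(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)); the normalization is an assumption."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 1.0  # both labelings consist of a single cluster
    return 2 * mutual_information(a, b) / (ha + hb)

# Same partition under different label names: NMI is 1
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Statistically independent partitions: NMI is 0
indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI depends only on how points are grouped, not on the label names, it can compare cluster vectors produced from different attribute groupings directly.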
Image Source: https://www.pngaaa.com/download/1559450
Normalized mutual information and correlation between CA, PA, and HI.
Several combinations of CA, PA, and HI will be created and optimized for the highest NMI values. One such set is
shown in the table below.
From the table above, we can conclude that the CA and HI groupings explain about 25% of each other; similarly, this number
is 40% for the PA and HI groupings. We found a strong location bias in the clustering
of the catchments.
Thank you!
Questions?
www.ankitdeshmukh.com
About my research
My fields of interest are:
- Computational hydrology
- Water resource management
- Understanding the catchment response under
anthropogenic changes.
My specialization is in approaches to identify
catchment vulnerability to environmental changes.
My current research focuses on the development of a
Physio-climatic catchment characteristics dataset for
the Indian subcontinent that can be utilized for
prediction in the ungauged basins. I possess a strong
understanding of GIS processing and am efficient in
Geo-spatial analysis.
I am highly motivated in the field of data analysis
(finding meaningful insights in data and ML), skilled in
programming with R, Python, and SQL scripting.
Reach out to me:
Dr. Ankit Deshmukh, D-block
PDEU C7 2nd Floor, Gandhinagar
/ankit-deshmukh-2
/anix7n
/anixn
References
Sivapalan, M. (2003). "Prediction in Ungauged Basins: A Grand Challenge for Theoretical Hydrology." Hydrological Processes, 17(15), 3163-3170.
Blöschl, G., Sivapalan, M., Wagener, T., Viglione, A., & Savenije, H. (Eds.). (2013). Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places, and Scales.
Cambridge University Press.
Hrachowitz, M., Savenije, H. H. G., Blöschl, G., McDonnell, J. J., Sivapalan, M., Pomeroy, J. W., ... & Wagener, T. (2013). "A Decade of Predictions in Ungauged Basins
(PUB) - A Review." Hydrological Sciences Journal, 58(6), 1198-1255.
Feng, D., Lawson, K., & Shen, C. (2020). "Prediction in Ungauged Regions with Sparse Flow Duration Curves and Input-Selection Ensemble Modeling." arXiv preprint
arXiv:2011.13380.
Willard, J. D., Varadharajan, C., Jia, X., & Kumar, V. (2023). "Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water
Resources." arXiv preprint arXiv:2308.09766.
Kratzert, F., et al. Caravan - A Global Community Dataset for Large-sample Hydrology. 1.4, Zenodo, 16 Apr. 2024, doi:10.5281/zenodo.10968468.
Vogel, R. M., & Sankarasubramanian, A. (2003). "Validation of a Watershed Model Without Calibration." Water Resources Research, 39(10), 1-14.
Addor, N., Newman, A. J., Mizukami, N., & Clark, M. P. (2017). "The CAMELS Data Set: Catchment Attributes and Meteorology for Large-Sample Studies." Hydrology and
Earth System Sciences, 21(10), 5293-5313.
Olden, J. D., & Poff, N. L. (2003). Redundancy and the choice of hydrologic indices for characterizing streamflow regimes. River Research and Applications, 19(2),
101-121. https://doi.org/10.1002/rra.700
Hopkins, B., & Skellam, J. G. (1954). "A new method for determining the type of distribution of plant individuals." Annals of Botany, 18(2),
213-227. doi:10.1093/oxfordjournals.aob.a083391.
Kumar, S., & Kaur, R. (2018). Hydrological Modeling of the Ganga Basin. In S. Kumar et al. (Eds.), Hydrology in Practice. Springer.
DOI: 10.1007/978-3-319-91548-6
Sivapalan, M., & Blöschl, G. (2017). The growth of hydrological understanding: Technologies, ideas, and societal needs shape the field. Water Resources Research,
53(10), 8137-8146. DOI: 10.1002/2017WR021396